Jingwen REN/Dan ZHAO/Yining KONG/Shanwen TANG
2024-03-04
The Customer Shopping Preferences Dataset reveal the purchasing patterns and preferences of our clients, and encompasses a broad spectrum of customer attributes.
we aim to achieve core objectives:
Data-Driven Product Optimization
Strategic Marketing Innovations
Enhancement of Customer Experience
We are the Analytics Team in JD_KT Audit Company:
Jingwen REN
Dan ZHAO
Yining KONG
Shanwen TANG
shopping_trends_updated.csv
Feature: Number of observations( 3900 ) Number of variables( 18 )
shopping_trends.csv
Feature: Number of observations( 3900 ) Number of variables( 19 )
## Use SQLite function to join two tables
con <- dbConnect(RSQLite::SQLite(), dbname = "your_database.db")
dbWriteTable(con, "shopping_trends_2_table", shopping_trends_2,overwrite = TRUE)
dbWriteTable(con, "shopping_trends_1_table", shopping_trends_1,overwrite = TRUE)
query <- "
SELECT
shopping_trends_2_table.*,
shopping_trends_1_table.Preferred_payment_method
FROM
shopping_trends_2_table
LEFT JOIN
shopping_trends_1_table
ON
shopping_trends_2_table.Customer_id = shopping_trends_1_table.Customer_id
"
result1 <- dbGetQuery(con, query)
apply(result1, 2, max, na.rm = TRUE) Customer_id Age Gender
"3900" "70" "Male"
Item_purchased Category Purchase_amount_usd
"T-shirt" "Outerwear" "100"
Location Size Color
"Wyoming" "XL" "Yellow"
Season Review_rating Subscription_status
"Winter" "5.0" "Yes"
Shipping_type Discount_applied Promo_code_used
"Store Pickup" "Yes" "Yes"
Previous_purchases Payment_method Frequency_of_purchases
"50" "Venmo" "Weekly"
Preferred_payment_method
"Venmo"
Customer_id Age Gender Item_purchased
Min. : 1.0 Min. :18.00 Length:3900 Length:3900
1st Qu.: 975.8 1st Qu.:31.00 Class :character Class :character
Median :1950.5 Median :44.00 Mode :character Mode :character
Mean :1950.5 Mean :44.07
3rd Qu.:2925.2 3rd Qu.:57.00
Max. :3900.0 Max. :70.00
Category Purchase_amount_usd Location Size
Length:3900 Min. : 20.00 Length:3900 Length:3900
Class :character 1st Qu.: 39.00 Class :character Class :character
Mode :character Median : 60.00 Mode :character Mode :character
Mean : 59.76
3rd Qu.: 81.00
Max. :100.00
Color Season Review_rating Subscription_status
Length:3900 Length:3900 Min. :2.50 Length:3900
Class :character Class :character 1st Qu.:3.10 Class :character
Mode :character Mode :character Median :3.70 Mode :character
Mean :3.75
3rd Qu.:4.40
Max. :5.00
Shipping_type Discount_applied Promo_code_used Previous_purchases
Length:3900 Length:3900 Length:3900 Min. : 1.00
Class :character Class :character Class :character 1st Qu.:13.00
Mode :character Mode :character Mode :character Median :25.00
Mean :25.35
3rd Qu.:38.00
Max. :50.00
Payment_method Frequency_of_purchases Preferred_payment_method
Length:3900 Length:3900 Length:3900
Class :character Class :character Class :character
Mode :character Mode :character Mode :character
Group customers according to age and the proportion of male and female in each group
Age_group <- function(age) {
if (age >= 18 & age <= 35) {
return("Youth")
} else if (age > 35 & age <= 60) {
return("Middle_age")
} else {
return("Elderly")
}
}
df <- data.frame(Age = shopping_trends$Age, Gender = shopping_trends$Gender)
df$Age_group <- sapply(df$Age, Age_group)
summary_df <- df %>%
dplyr::group_by(Age_group, Gender) %>%
dplyr::summarize(count = n(), .groups = "drop") %>%
dplyr::group_by(Age_group) %>%
dplyr::mutate(total_count = sum(count),
percentage = paste0(round(count / total_count * 100, 2), "%")) %>%
dplyr::ungroup() %>%
dplyr::arrange(Age_group, desc(Gender)) %>%
dplyr::select(-total_count)| Age_group | Gender | count | percentage |
|---|---|---|---|
| Elderly | Male | 495 | 68.46% |
| Elderly | Female | 228 | 31.54% |
| Middle_age | Male | 1259 | 67.54% |
| Middle_age | Female | 605 | 32.46% |
| Youth | Male | 898 | 68.39% |
| Youth | Female | 415 | 31.61% |
Result
The Middle age Group (35-60) has the larest number(including male and female).
The proportion of male is larger than female in all groups.
The most popular category, and the three largest items proportion in this category
# 2.2 The three largest items proportion in Clothing
items_category <- dplyr:::select(shopping_trends,Item_purchased, Category)
items_category_1 <-items_category %>%
dplyr:::filter(Category == "Clothing")%>%
group_by(Item_purchased)%>%
summarise(count = n())%>%
slice_max(order_by=Item_purchased,
n=3)
items_category_1 %>%
gt()| Item_purchased | count |
|---|---|
| T-shirt | 147 |
| Sweater | 164 |
| Socks | 159 |
Result
T-shirt, Sweater, Socks are the largest items in the largest category.
The Season correlate with the purchase amount
shopping_trends$Season <- factor(shopping_trends$Season, levels = c("Spring", "Summer", "Fall", "Winter"))
ggplot(shopping_trends) +
aes(x = Season, y = Purchase_amount_usd) +
geom_boxplot(fill = "#112446") +
labs(x = "Season", y = "Purchase amount USD", title = "Season vs. Purchase Amount ") +
theme(axis.text.x = element_text(size = 15))+
theme_minimal()The most common shipping type for customers with a review rating above 4
common_type <- shopping_trends %>%
filter(Review_rating > 4) %>%
group_by(Shipping_type) %>%
summarise(n = n()) %>%
arrange(desc(n)) %>%
slice(1)
common_type %>%
gt()| Shipping_type | n |
|---|---|
| Standard | 272 |
Result
The most common shipping type for customers with a review rating above 4 is Standard.
Best Color sold in each Season
season_color_sales <- shopping_trends %>%
group_by(Season, Color) %>%
summarise(Count = n())
top_colors <- season_color_sales %>%
group_by(Season) %>%
top_n(1, Count)
top_colors <- top_colors[match(c("Spring", "Summer", "Fall", "Winter"),
top_colors$Season), ]
top_colors %>%
ungroup() %>%
gt()| Season | Color | Count |
|---|---|---|
| Spring | Olive | 52 |
| Summer | Silver | 59 |
| Fall | Magenta | 50 |
| Winter | Green | 50 |
The average purchase amount for customers who have made more than 30 previous purchases
avg_purchase <- mean(
shopping_trends$Purchase_amount_usd[
shopping_trends$Previous_purchases > 30
],
na.rm = TRUE
) %>%
round(digits = 2)
print(avg_purchase)[1] 60.03
Result
The average purchase amount is 60.03.
The average purchase amount for customers who have a subscription and used Venmo as the payment method
# Filter for customers with a subscription and who used Venmo
filtered_data <- shopping_trends %>%
filter(Subscription_status == "Yes", Payment_method == "Venmo")
# Calculate the average purchase amount
average_purchase_amount <- filtered_data %>%
summarise(AverageAmount = mean(Purchase_amount_usd))
#print(average_purchase_amount)Result
The average purchase amount for this customer group is 57.5 USD.
Locations with stronger purchase power and more subscribed members
## Filter the subscribed members
subscribed_members <- shopping_trends %>%
filter(Subscription_status == "Yes")
## TOP6 Locations with stronger purchase power
top6_purchase_power <- subscribed_members %>%
group_by(Location) %>%
summarise(Total_Purchase_Amount = sum(Purchase_amount_usd)) %>%
top_n(6, Total_Purchase_Amount)
## TOP6 Locations with most Subscription members
top6_members_count <- subscribed_members %>%
group_by(Location) %>%
summarise(Members_Count = n_distinct(Customer_id)) %>%
top_n(6, Members_Count)
## bar charts
top_purchases_plot <-
ggplot(top6_purchase_power,
aes(x = reorder(Location, -Total_Purchase_Amount),
y = Total_Purchase_Amount, fill = Location)) +
geom_bar(stat = "identity") +
geom_text(aes(label = sprintf("%.2f", Total_Purchase_Amount)),
stat = "identity",
position = position_stack(vjust = 0.5),
colour = "white", size = 2) +
scale_fill_brewer(palette = "Set2", direction = 1) +
ggthemes::theme_fivethirtyeight() +
theme(text = element_text(size = 9),
plot.title = element_text(size = 9),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.key.size = unit(0.3, "cm"),
legend.title = element_blank()) +
labs(title = "Top 6 Locations by Total Purchase(USD)", x = "Location", y = "Total Purchase Amount (USD)")
top_members_plot <-
ggplot(top6_members_count,
aes(x = reorder(Location, -Members_Count),
y = Members_Count, fill = Location)) +
geom_bar(stat = "identity") +
geom_text(aes(label = sprintf("%0.0f", Members_Count)),
stat = "identity",
position = position_stack(vjust = 0.5),
colour = "white", size = 2) +
scale_fill_brewer(palette = "Set2", direction = 1) +
ggthemes::theme_fivethirtyeight() +
theme(text = element_text(size = 9),
plot.title = element_text(size = 9),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.key.size = unit(0.3, "cm"),
legend.title = element_blank()) +
labs(title = "Top 6 Locations by Members", x = "Location", y = "Members Count")
cat('</div>')Missouri ranks among the top in terms of member registrations, yet their purchasing power is not very strong. Oklahoma has relatively strong purchasing power (ranking 5th among the 50 states), but they do not have many registered members.
Increase exclusive member discounts: To enhance purchasing power, offer exclusive discounts and double points accumulation for members, encouraging consumption.
Enhance user engagement: By hosting exclusive events for members, increase their sense of participation and brand loyalty, thereby enhancing their willingness to purchase.
Referral reward program: Implement a referral reward program to encourage existing members to refer new users to register as members, attracting more users through word-of-mouth.
Coupons and trials: Provide coupons or free trials for new registered users to lower their initial purchase threshold.
Here is our work division among four group members
https://www.kaggle.com/datasets/iamsouravbanerjee/customer-shopping-trends-dataset/data
https://quarto.org/docs/presentations/revealjs/
https://bookdown.org/yihui/rmarkdown/markdown-syntax.html